Geospatial Visualisations

This notebook explores visualising information in a geospatial format: essentially a heatmap of London coloured by price, presented like the following image:

London

This would be an appealing way of conveying trends that are tied to the location of an area, and of answering some of the questions in our brief in a more readable manner.

This is done with the Folium library, which supports geospatial visualisation, the creation of maps, and geospatial data structures such as GeoJSON. A GeoJSON file is a JSON file that can store polygons which, when overlaid onto a world map, form districts. Each polygon is a list of coordinate points that translates into a shape, and hence a district. For our use case we need to either find or create a GeoJSON file of the London outcodes we have collected.

Luckily, a GitHub user named radoi90 has already created a GeoJSON file for this exact purpose.
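As a rough illustration of the structure described above, the sketch below builds a single-district GeoJSON document by hand. The outcode label, the property name and the coordinates are made up for the example and are not taken from the actual outcode file.

```python
import json

# One hypothetical district: a closed ring of [longitude, latitude] points.
# GeoJSON requires the first and last point of a polygon ring to be identical.
district = {
    "type": "Feature",
    "id": "E1",  # illustrative outcode label
    "properties": {"name": "E1"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-0.08, 51.51],
            [-0.04, 51.51],
            [-0.04, 51.53],
            [-0.08, 51.53],
            [-0.08, 51.51],
        ]],
    },
}

# A GeoJSON file usually wraps many such features in a FeatureCollection.
collection = {"type": "FeatureCollection", "features": [district]}

ring = district["geometry"]["coordinates"][0]
geojson_text = json.dumps(collection)
```

A real outcode file follows the same pattern, only with one feature per district and many more points per polygon.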

Geopandas Plotting

Using the Geopandas library, we can plot several interesting visuals. The following section shows some examples that can be incorporated into the analysis.

The visualisations below cover a single value for each postcode district/area. This type of plot is known as a choropleth. Each district has a value, and a colour is assigned to that value on a scale, similar to a heatmap. This method allows the geospatial distribution of the value to be visualised.

Crime per Hectare - Average Houseprice

This comparison fails to reveal any relationship between the crime rate and the price of housing. This could be due to the small number of crimes committed per hectare. One observation is that central London shows somewhat higher figures than any other area, which is expected.

The areas with the lowest house prices do not have the lowest crime rates according to the per-hectare variables.

Folium Plotting

Although these static visualisations are useful for observing the spread of values over a geographical area, it would be more useful to see the crime statistics plotted directly alongside the house price/rent. This can be done with Folium, an interactive map library built on the Leaflet JavaScript library.

Controlling the colour of each district with a custom style function would be difficult, as it relies on outdated documentation. An alternative is to let Folium calculate the choropleth map directly. This means writing the cleaned data to a GeoJSON file with Geopandas, then reading it back and linking it to a dataframe we already have.

Now that both data sources are compiled, with the postcode ID matching the ID in the GeoJSON and the geometry removed from the data, we can build the maps and obtain a Folium choropleth map.

The entire analysis above used metrics for crimes per hectare, which may be more representative of relationships/correlations. That said, repeating the same calculations with the actual crime counts may prove more representative. The question of:

"Is there an association between rents and the level of crime in Greater London postcodes?"

seems to be inconclusive. Currently there does not seem to be a relationship visible in any of the areas shown in the geospatial projections. The initial Geopandas plot of crimes per hectare against the average house price shows almost no relationship. This could be due to the many external factors affecting people's decision-making when choosing an area to live in. It would make logical sense that a higher level of crime makes an area less desirable to live in, but the reverse direction may also hold: a lower-income area may experience higher levels of crime. Neither can be stated conclusively, as the relationship is not fully modelled in this analysis.

Crime count Analysis

Following the above analysis, we will now plot each crime count alongside the average house price to see if there are any observable patterns. The total crime per hectare, although useful as a summary statistic, obscures the smaller values and does not allow them to speak for themselves.
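To illustrate why raw counts and per-hectare figures can tell different stories, the sketch below assumes the per-hectare metric is simply the crime count divided by the district area in hectares. The district labels and numbers are invented for the example.

```python
# Two hypothetical districts with the same crime count but different areas.
districts = [
    {"outcode": "A", "crimes": 1200, "hectares": 300.0},
    {"outcode": "B", "crimes": 1200, "hectares": 1200.0},
]

for district in districts:
    # Assumed definition: crimes per hectare = count / area in hectares.
    district["crimes_per_hectare"] = district["crimes"] / district["hectares"]

per_hectare = {d["outcode"]: d["crimes_per_hectare"] for d in districts}
counts = {d["outcode"]: d["crimes"] for d in districts}
```

Both districts look identical on a map of counts, while the per-hectare map shows the smaller district as four times as dense, so the two views are worth plotting side by side.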

Following the visualisations abo